Classifying document types to enhance search and recommendations in digital libraries

Abstract

In this paper, we address the problem of classifying documents available from the global network of (open access) repositories according to their type. We show that the repository-provided metadata that would enable us to distinguish research papers, theses and slides are missing in over 60% of cases. While these metadata describing document types are useful in a variety of scenarios, ranging from research analytics to improving search and recommender (SR) systems, this problem has not yet been sufficiently addressed in the context of the repositories infrastructure. We have developed a new approach for classifying document types using supervised machine learning based exclusively on text-specific features. We achieve an F1-score of 0.96 using the random forest and AdaBoost classifiers, which are the best-performing models on our data. By analysing the SR system logs of the CORE [1] digital library aggregator, we show that users are an order of magnitude more likely to click on research papers and theses than on slides. This suggests that using document types as a feature for ranking/filtering SR results in digital libraries has the potential to improve user experience.
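The F1-score reported above combines precision and recall into a single number per document type. The following is a minimal sketch of that computation for the three classes named in the abstract. It is not the authors' evaluation code: the function names `f1_per_class` and `macro_f1`, the toy labels, and the choice of macro-averaging are all assumptions made for illustration, since the abstract does not say how the score was averaged.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy, made-up labels; not data from the paper.
y_true = ["paper", "paper", "thesis", "slides", "slides", "thesis"]
y_pred = ["paper", "paper", "thesis", "slides", "paper", "thesis"]

per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

In this toy example one slide deck is misclassified as a research paper, which lowers precision for the paper class and recall for the slides class while leaving the thesis class unaffected.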
